Virtual decoders

ABSTRACT

In various embodiments, a Multipoint Control Unit (MCU) or another video conferencing device (e.g., an endpoint) may generate a video frame that includes video images of two or more video conferencing endpoints. The video frame may then be sent to another video conferencing device that may receive the video frame and separate the two or more video images into separate video images. In some embodiments, the video frame may be separated into its separate images using, for example, metadata sent along with the video frame. The metadata may include video image identifiers and location information (e.g., coordinates in the video frame) of the video images. In some embodiments, the separated video images may be provided to a compositor that may composite the separated video images, for example, into a new layout.

PRIORITY

This application claims the benefit of priority of U.S. Provisional Patent Application Ser. No. 60/945,723 titled “Virtual Decoders”, filed on Jun. 22, 2007, whose inventors are Keith C. King and Wayne E. Mock, which is hereby incorporated by reference in its entirety as though fully and completely set forth herein.

This application also claims the benefit of priority of U.S. Provisional Patent Application Ser. No. 60/945,734 titled “Videoconferencing Device which Performs Multi-way Conferencing”, filed on Jun. 22, 2007, whose inventors are Keith C. King and Wayne E. Mock, which is hereby incorporated by reference in its entirety as though fully and completely set forth herein.

This application also claims the benefit of priority of U.S. Provisional Patent Application Ser. No. 60/949,674 titled “Virtual Multiway Scaler Compensation”, filed on Jul. 13, 2007, whose inventors are Keith C. King and Wayne E. Mock, which is hereby incorporated by reference in its entirety as though fully and completely set forth herein.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates generally to conferencing and, more specifically, to video conferencing.

2. Description of the Related Art

Video conferencing may be used to allow two or more participants at remote locations to communicate using both video and audio. Each participant location may include a video conferencing endpoint for video/audio communication with other participants. Each video conferencing endpoint may include a camera and microphone to collect video and audio from a first or local participant to send to another (remote) participant. Each video conferencing endpoint may also include a display and speaker to reproduce video and audio received from a remote participant. Each video conferencing endpoint may also be coupled to a computer system to provide additional functionality for the video conference. For example, additional functionality may include data conferencing (including displaying and/or modifying a document for two or more participants during the conference).

Video conferencing involves transmitting video streams between video conferencing endpoints. The video streams transmitted between the video conferencing endpoints may include video frames. The video frames may include pixel macroblocks that may be used to construct video images for display in the video conference. Video frame types may include intra-frames, forward predicted frames, and bi-directional predicted frames. These frame types may involve different types of encoding and decoding to construct video images for display. Currently, in a multi-way video conference call, a multipoint control unit (MCU) may composite video images received from different video conferencing endpoints onto video frames of a video stream that may be encoded and transmitted to the various video conferencing endpoints for display.

SUMMARY OF THE INVENTION

In various embodiments, an MCU or another video conferencing device (e.g., an endpoint) may generate a video frame that includes video images of two or more video conferencing endpoints. The MCU may also transmit coordinate information along with the video frame (e.g., as metadata). The metadata may include video image identifiers and location information (e.g., coordinates in the video frame) of the video images. The video frame may then be sent to a video conferencing endpoint that may receive the video frame and separate the two or more video images into separate video images. In some embodiments, the coordinate information sent along with the video frame may be used by the video conferencing endpoint to determine the locations of the video images in the video frame to facilitate separation of the video images.

In some embodiments, after the video conferencing endpoint separates out the video images, the separated video images may be provided to a compositor that may composite the separated video images into a new video image layout. Other video images (e.g., from local video or received from other video conferencing endpoints) may also be composited into the new video image layout. In some embodiments, the new video image layout may be configured to be displayed (e.g., as a continuous presence image). In some embodiments, participants at each video conferencing endpoint may use their local video conferencing endpoints to customize their continuous presence layout. For example, participants may rearrange the video images and/or replace one or more video images in the video image layout (e.g., with a current video image from their local video source).

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a video conferencing endpoint network, according to an embodiment.

FIG. 2 illustrates a video conferencing endpoint, according to an embodiment.

FIG. 3 illustrates a flowchart of a method for compositing a video image layout at an MCU and forming a new layout at the endpoint, according to an embodiment.

FIGS. 4a-d illustrate an MCU transmitting a video frame comprising multiple video images, according to an embodiment.

FIG. 5a illustrates an overall view of the re-compositing process including a virtual decoder, according to an embodiment.

FIG. 5b illustrates several embodiments of composite video images.

FIG. 6 illustrates a video image layout, according to an embodiment.

FIG. 7 illustrates separated video images from the video image layout, according to an embodiment.

FIG. 8 illustrates a new video layout using the separated video images, according to an embodiment.

FIG. 9 illustrates a coordinate system for a video frame, according to an embodiment.

FIG. 10 illustrates various video image layouts, according to various embodiments.

While the invention is susceptible to various modifications and alternative forms, specific embodiments thereof are shown by way of example in the drawings and will herein be described in detail. It should be understood, however, that the drawings and detailed description thereto are not intended to limit the invention to the particular form disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the present invention as defined by the appended claims. Note, the headings are for organizational purposes only and are not meant to be used to limit or interpret the description or claims. Furthermore, note that the word “may” is used throughout this application in a permissive sense (i.e., having the potential to, being able to), not a mandatory sense (i.e., must). The term “include”, and derivations thereof, mean “including, but not limited to”. The term “coupled” means “directly or indirectly connected”.

DETAILED DESCRIPTION OF THE EMBODIMENTS

Incorporation by Reference

U.S. patent application titled “Speakerphone”, Ser. No. 11/251,084, which was filed Oct. 14, 2005, whose inventor is William V. Oxford, is hereby incorporated by reference in its entirety as though fully and completely set forth herein.

U.S. patent application titled “Videoconferencing System Transcoder”, Ser. No. 11/252,238, which was filed Oct. 17, 2005, whose inventors are Michael L. Kenoyer and Michael V. Jenkins, is hereby incorporated by reference in its entirety as though fully and completely set forth herein.

U.S. patent application titled “Speakerphone Supporting Video and Audio Features”, Ser. No. 11/251,086, which was filed Oct. 14, 2005, whose inventors are Michael L. Kenoyer, Craig B. Malloy, and Wayne E. Mock, is hereby incorporated by reference in its entirety as though fully and completely set forth herein.

U.S. patent application titled “Video Conferencing System which Allows Endpoints to Perform Continuous Presence Layout Selection”, Ser. No. ______, which was filed Jun. 19, 2008, whose inventors are Keith C. King and Wayne E. Mock, is hereby incorporated by reference in its entirety as though fully and completely set forth herein.

U.S. patent application titled “Video Conferencing Device which Performs Multi-way Conferencing”, Ser. No. ______, which was filed Jun. 19, 2008, whose inventors are Keith C. King and Wayne E. Mock, is hereby incorporated by reference in its entirety as though fully and completely set forth herein.

U.S. patent application titled “Video Decoder which Processes Multiple Video Streams”, Ser. No. ______, which was filed Jun. 19, 2008, whose inventors are Keith C. King and Wayne E. Mock, is hereby incorporated by reference in its entirety as though fully and completely set forth herein.

U.S. patent application titled “Integrated Videoconferencing System”, Ser. No. 11/405,686, which was filed Apr. 17, 2006, whose inventors are Michael L. Kenoyer, Patrick D. Vanderwilt, Craig B. Malloy, William V. Oxford, Wayne E. Mock, Jonathan I. Kaplan, and Jesse A. Fourt, is hereby incorporated by reference in its entirety as though fully and completely set forth herein.

FIG. 1 illustrates an exemplary embodiment of a video conferencing system network 100, which may include a network 101, endpoints 103a-103d (e.g., video conferencing systems), and a Multipoint Control Unit (MCU) 108. Although not shown in FIG. 1, the video conferencing system network 100 may also include other devices, such as gateways, a service provider, conference units, and plain old telephone system (POTS) telephones, among others. Endpoints 103a-103d may be coupled to network 101 via gateways (not shown). Gateways may each include firewall, network address translation (NAT), packet filter, and/or proxy mechanisms, among others.

The endpoints 103a-103d may include video conferencing system endpoints (also referred to as “participant locations”). Each endpoint 103a-103d may include a camera, display device, microphone, speakers, and a codec or other type of video conferencing hardware. In some embodiments, endpoints 103a-103d may include video and voice communications capabilities (e.g., video conferencing capabilities) and include or be coupled to various audio devices (e.g., microphones, audio input devices, speakers, audio output devices, telephones, speaker telephones, etc.) and include or be coupled to various video devices (e.g., monitors, projectors, displays, televisions, video output devices, video input devices, cameras, etc.). In some embodiments, endpoints 103a-103d may include various ports for coupling to one or more devices (e.g., audio devices, video devices, etc.) and/or to one or more networks. Endpoints 103a-103d may each include and/or implement one or more real time protocols, e.g., session initiation protocol (SIP), H.261, H.263, H.264, H.323, among others. In an embodiment, endpoints 103a-103d may implement H.264 encoding for high definition (HD) video streams.

In some embodiments, the MCU 108 may function as a Multipoint Control Unit to receive video from two or more sources (e.g., endpoints 103a-d) and provide video (e.g., with composited video images) to two or more recipients (e.g., endpoints). “MCU” as used herein is intended to have the full breadth of its ordinary meaning.

The network 101 may include a wide area network (WAN) such as the Internet. The network 101 may include a plurality of networks coupled together, e.g., one or more local area networks (LANs) coupled to the Internet. The network 101 may also include a public switched telephone network (PSTN). The network 101 may also include an Integrated Services Digital Network (ISDN) that may include or implement H.320 capabilities. In various embodiments, video and audio conferencing may be implemented over various types of networked devices.

In some embodiments, endpoints 103a-103d and MCU 108 may each include various wireless or wired communication devices that implement various types of communication, such as wired Ethernet, wireless Ethernet (e.g., IEEE 802.11), IEEE 802.16, paging logic, RF (radio frequency) communication logic, a modem, a digital subscriber line (DSL) device, a cable (television) modem, an ISDN device, an ATM (asynchronous transfer mode) device, a satellite transceiver device, a parallel or serial port bus interface, and/or other type of communication device or method.

In various embodiments, the methods and/or systems described may be used to implement connectivity between or among two or more participant locations or endpoints, each having voice and/or video devices (e.g., endpoints 103a-103d and MCU 108, etc.) that communicate through network 101.

In some embodiments, the video conferencing system network 100 (e.g., endpoints 103a-d and MCU 108) may be designed to operate with network infrastructures that support T1 capabilities or less, e.g., 1.5 mega-bits per second or less in one embodiment, and 2 mega-bits per second in other embodiments. In some embodiments, other capabilities may be supported (e.g., 6 mega-bits per second, over 10 mega-bits per second, etc.). The video conferencing system may support HD capabilities. The term “high resolution” includes displays with resolution of 1280×720 pixels and higher. In one embodiment, high-definition resolution may include 1280×720 progressive scans at 60 frames per second, or 1920×1080 interlaced or 1920×1080 progressive. Thus, an embodiment of the present invention may include a video conferencing system with HD (e.g., similar to HDTV) display capabilities using network infrastructures with bandwidths of T1 capability or less. The term “high-definition” is intended to have the full breadth of its ordinary meaning and includes “high resolution”.

FIG. 2 illustrates an exemplary embodiment of a video conferencing system endpoint 103 (e.g., 103a), also referred to as an endpoint or participant location. The endpoint 103 may have a system codec box 209 to manage both a speakerphone 205/207 and the video conferencing devices. The speakerphones 205/207 and other video conferencing system components may be coupled to the codec box 209 and may receive audio and/or video data from the system codec box 209.

In some embodiments, the endpoint 103 may include a camera 204 (e.g., an HD camera) for acquiring video images of the participant location (e.g., of participant 214). Other cameras are also contemplated. The endpoint 103 may also include a display 201 (e.g., an HDTV display). Video images acquired by the camera 204 may be displayed locally on the display 201 and may also be encoded and transmitted to other video conferencing endpoints 103 in the video conference, e.g., through the MCU 108.

The endpoint 103 may also include a sound system 261. The sound system 261 may include multiple speakers including left speakers 271, center speaker 273, and right speakers 275. Other numbers of speakers and other speaker configurations may also be used. The endpoint 103 may also use one or more speakerphones 205/207 which may be daisy chained together.

In some embodiments, the video conferencing endpoint components (e.g., the camera 204, display 201, sound system 261, and speakerphones 205/207) may be coupled to the system codec (“compressor/decompressor”) box 209. The system codec box 209 may be placed on a desk or on a floor. Other placements are also contemplated. The system codec box 209 may receive audio and/or video data from a network (e.g., network 101). The system codec box 209 may send the audio to the speakerphone 205/207 and/or sound system 261 and the video to the display 201. The received video may be HD video that is displayed on the HD display. The system codec box 209 may also receive video data from the camera 204 and audio data from the speakerphones 205/207 and transmit the video and/or audio data over the network 101 to another conferencing system. The conferencing system may be controlled by a participant through the user input components (e.g., buttons) on the speakerphones 205/207 and/or remote control 250. Other system interfaces may also be used.

In various embodiments, the system codec box 209 may implement a real time transmission protocol. In some embodiments, a system codec box 209 may include any system and/or method for encoding and/or decoding (e.g., compressing and decompressing) data (e.g., audio and/or video data). In some embodiments, the system codec box 209 may not include one or more of the compressing/decompressing functions. In some embodiments, communication applications may use system codec box 209 to convert an analog signal to a digital signal for transmitting over various digital networks (e.g., network 101, PSTN, the Internet, etc.) and to convert a received digital signal to an analog signal. In various embodiments, codecs may be implemented in software, hardware, or a combination of both. Some codecs for computer video and/or audio may include MPEG, Indeo™, and Cinepak™, among others.

In some embodiments, the endpoint 103 may capture a local image of the local participants and provide a video stream to the MCU 108. The MCU 108 may also receive video streams from other endpoints 103. The MCU 108 may create a composite image of two or more of the received video streams and provide the composite image to each of the endpoints 103. The composite image, generated by the MCU 108, may have a certain layout. According to one embodiment, the MCU 108 may also generate coordinate information (or metadata) that describes the locations of the various images in the composite image. The endpoint 103 may use the coordinate information to separate the plurality of images from the composite image, and then generate a new composite image having a new layout, e.g., as specified by the user. The endpoint 103 may also use a virtual decoder technique in separating out the received composite image, as described below. In some embodiments, separating may include copying, replacing, and/or modifying data from the video images to be used to create a new composite image.

FIG. 3 illustrates a flowchart of a method for compositing a video image layout at an MCU 108 and forming a new layout at the endpoint 103, according to an embodiment. It should be noted that in various embodiments of the methods described below, one or more of the elements described may be performed concurrently, in a different order than shown, or may be omitted entirely. Other additional elements may also be performed as desired.

At 301, the MCU 108 may receive video images 555 from a plurality of endpoints 103. The endpoints 103 may be remote (e.g., endpoints 103a, 103b, and 103c) or local (e.g., local endpoint 103d including a local camera) and the video images 555 may include video (e.g., from camera 204) or presentations (e.g., from a Microsoft PowerPoint™ presentation). In some embodiments, the MCU 108 may use one or more decoders 409 (e.g., three decoders 409) to decode the received video images 555 from the respective endpoints 103. For example, video packets for the video frames with the respective received video images 555 may be assembled as they are received (e.g., over an Internet Protocol (IP) port) into the MCU 108. FIGS. 4a-d illustrate embodiments of MCUs 108.

In some embodiments, the MCU 108 may also receive video image layout preferences from one or more of the endpoints 103. For example, endpoint 103 may receive a video image layout preference from one or more video conferencing participants 214 (e.g., through a menu on an on-screen interface) and may transmit that preference to the MCU 108. In some embodiments, a button on remote 250 may allow a video conference participant 214 to cycle through two or more available layout preferences. The video image layout preference may include a layout type (e.g., layout type 1001, 1003, 1005, 1007, 1009, or 1011 as seen in FIG. 10). Other layout types are also possible. The video image layout preference may specify which endpoint's video image to place in each of the available layout positions (e.g., which endpoint video image should be placed in the main layout position and which endpoint video images should be placed in the other layout positions). In some embodiments, the MCU 108 may not receive a video image layout preference from one or more endpoints 103. In some embodiments, the video image layout preference may be generated at the MCU 108. For example, software on the MCU 108 may determine which endpoint 103 has the current speaker/presenter and may place the corresponding video image in a main video image window of the layout (e.g., with other endpoint video images arranged around the main video image). Other layout selection methods are also contemplated.
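The cycling behavior described above lends itself to a brief illustration. The following Python sketch is hypothetical (the preset names, class, and method are illustrative stand-ins, not taken from the patent); it shows one way an endpoint might step through available layout preferences as a participant presses a button on remote 250.

```python
from itertools import cycle

# Hypothetical preset names standing in for layout types such as
# 1001-1011 in FIG. 10; the actual set and ordering are assumptions.
LAYOUT_TYPES = ["full-screen", "1+2 side", "2x2 grid", "1+5", "3x3 grid"]

class LayoutSelector:
    """Cycles through available layout preferences as a participant
    presses a layout button (e.g., on remote 250)."""

    def __init__(self) -> None:
        self._layouts = cycle(LAYOUT_TYPES)
        self.current = next(self._layouts)

    def on_layout_button(self) -> str:
        """Advance to the next layout preference; an endpoint might
        transmit the returned value to the MCU 108."""
        self.current = next(self._layouts)
        return self.current

selector = LayoutSelector()
print(selector.on_layout_button())  # "1+2 side"
```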

In some embodiments, the MCU 108 may also be operable to receive other information from the endpoints 103. For example, an endpoint 103 may send data to the MCU 108 to move a far end camera (e.g., on another endpoint). The MCU 108 may subsequently transmit this information to the respective endpoint to move the far end camera.

At 303, the MCU 108 may generate a composite video image comprising two or more video images 555 (for example, from the endpoints 103 (such as video images 555a, 555b, 555c, and 555d)). In some embodiments, the MCU 108 may have one or more scalers 411 (e.g., four scalers) and compositors 413 to scale received video images 555 and composite two or more of the video images 555 from the endpoints 103 into, for example, a composite video image 505 (e.g., which may include one or more video images 555 in, for example, a continuous presence layout). Example composite video images 505 are illustrated in FIG. 5b (e.g., composite video images 505a, 505b, 505c, and 505d).

In some embodiments, scalers 411 may be coupled to video decoders 409 (e.g., through crosspoint switch 499 shown in FIG. 4c) that decode video images 555 from the various video sources (e.g., endpoints 103). The scalers 411 may scale the video images 555 after the video images 555 are decoded. In some embodiments, one or more of the video images 555 may not be scaled. For example, the two or more video images 555 may be rearranged into a composite video image 505 without being scaled. In some embodiments, the scalers 411 may be 7-15 tap scalers. The scalers 411 may use linear combinations (e.g., with similar or different coefficients) of a plurality of pixels in a video image 555 for each pixel scaled. Other scalers 411 are also contemplated. In some embodiments, the video images 555 may be stored in shared memory 495 after being scaled. In some embodiments, the scaler 411, compositor 421, compositor 413, and scalers 415 may be included on one or more FPGAs (Field-Programmable Gate Arrays). Other processor types and processor distributions are also contemplated. For example, FPGAs and/or other processors may be used for one or more other elements shown in FIG. 4b.
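As a rough illustration of the "linear combination of a plurality of pixels" idea behind an n-tap scaler, the sketch below scales one row of pixels with a single fixed 7-tap kernel. This is a deliberate simplification under stated assumptions: a real 7-15 tap polyphase scaler (and certainly an FPGA implementation such as scalers 411) would select different coefficient sets per output phase.

```python
import numpy as np

def scale_row(row: np.ndarray, out_len: int, taps: np.ndarray) -> np.ndarray:
    """Produce out_len output pixels, each a weighted linear
    combination of len(taps) neighboring source pixels."""
    n = len(taps)
    padded = np.pad(row.astype(float), n // 2, mode="edge")
    out = np.empty(out_len)
    for i in range(out_len):
        src = (i * len(row)) // out_len      # nearest source position
        out[i] = np.dot(padded[src:src + n], taps)
    return out

row = np.linspace(0.0, 255.0, 16)            # one row of 16 pixels
taps = np.ones(7) / 7.0                      # simple 7-tap averaging kernel
print(scale_row(row, 8, taps))               # scaled down to 8 pixels
```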

In some embodiments, compositors 413 may access the video images 555 (e.g., from shared memory 495) to form composited video images. In some embodiments, the MCU 108 may composite the video images 555 into the respective video image layouts requested by the endpoints 103. For example, the MCU 108 may composite two or more of the received video images 555 into a continuous presence layout (e.g., see layout types 1001, 1003, 1005, 1007, 1009, or 1011 in FIG. 10). In some embodiments, the MCU 108 may form multiple composite video images according to respective received video image layout preferences.

In some embodiments, the output of the compositors 413 may again be scaled (e.g., by scalers 415 (such as scalers 415a, 415b, and 415c)) prior to being encoded by video encoders 453. The video data received by scalers 415 may be scaled according to the resolution requirements of a respective endpoint 103. In some embodiments, the output of a compositor 413 may not be scaled prior to being encoded and transmitted to the endpoints 103. In some embodiments, the composite video image 505 may be transmitted as a video frame 507 through video stream 500 (see FIG. 5a) to the respective endpoints 103.

In some embodiments, the MCU 108 may determine the coordinates of the video images 555 in the composite video image 505. For example, the coordinate information 519 may indicate the start/stop locations of one or more of the video images 555 in the video frame 507. This coordinate information 519 may be stored on the MCU 108.

At 305, the MCU 108 may transmit the composite video image 505 (which includes one or more video images 555) and the coordinate information 519 to each endpoint 103. For example, the MCU 108 may transmit a respective composite video image 505 (with the respective coordinate information 519 for the respective composite video image 505) to a respective endpoint 103 (e.g., according to the video image layout preference received from the respective endpoint 103). The MCU 108 may also transmit the coordinate information 519 to the endpoints 103. The coordinate information 519 sent to a respective endpoint 103 may be specific to the respective composite video image 505 sent to that endpoint 103. The coordinate information 519 may identify the locations of specific video images 555 in the received composite video image 505. In some embodiments, the coordinate information 519 may be transmitted as metadata 901 with the composite video image 505. The metadata 901 may include coordinate information 519 for a video frame 507 with the start (and/or stop) information for a video image 555 (e.g., video image boundaries and/or pixel start/stop points) corresponding to an endpoint 103. The metadata 901 may also include attributes of each of the plurality of endpoints 103, including identifying information respective to the corresponding endpoints 103 for each video image 555. Other information in the metadata 901 is also contemplated.
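The patent does not fix a wire format for metadata 901, but a minimal sketch helps fix ideas. The field names below are assumptions; only the content (per-image endpoint identifiers plus boundary coordinates, and the composite frame size used at 307 below) comes from the description.

```python
import json

# Hypothetical encoding of metadata 901 for a 2x2 composite video
# image 505 carried in a 1280x720 video frame 507. Field names are
# illustrative assumptions, not the patent's format.
metadata_901 = {
    "frame_size": {"width": 1280, "height": 720},
    "images": [
        {"endpoint": "103a", "left": 0,   "top": 0,   "right": 639,  "bottom": 359},
        {"endpoint": "103b", "left": 640, "top": 0,   "right": 1279, "bottom": 359},
        {"endpoint": "103c", "left": 0,   "top": 360, "right": 639,  "bottom": 719},
        {"endpoint": "103d", "left": 640, "top": 360, "right": 1279, "bottom": 719},
    ],
}

# The MCU could serialize this and transmit it alongside the frame:
payload = json.dumps(metadata_901)
```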

At 307, the endpoint 103 may receive the composite video image 505 and the coordinate information 519 (e.g., in metadata 901). For example, video frame 507 comprising two or more video images 555 may be received. The video frame 507 may be received as a series of video packets 503 in video stream 500 at decoder 515. The decoder 515 may assemble the video packets 503 into their respective video frames 507 for further processing in virtual decoder 517.

In some embodiments, the coordinate information 519 may include a size of an original composite video image. For example, after determining the coordinate information 519, the MCU 108 may need to subsequently scale the composite video image (e.g., scale down the composite video image to be sent over a reduced bandwidth network connection) to be sent to one or more endpoints 103. In some embodiments, the composite video image 505 may be scaled to a scaled composite video image in a scaler (e.g., scaler 415). The coordinate information 519 may be included in metadata 901 passed with a video frame 507 that includes the scaled composite video image. In some embodiments, the coordinate information 519 may be reformatted (e.g., at the MCU 108 or at the receiving endpoint 103) to reflect the new coordinates of one or more of the resized video images in the scaled composite video image. For example, when the endpoint 103 receives the scaled composite video image, the endpoint 103 may detect the actual size of the scaled composite video image and may determine the new coordinates of one or more of the video images 555 in the scaled composite video image using, for example, a ratio of the size of the original composite video image (which may be indicated in the coordinate information 519) to the size of the scaled composite video image detected by the endpoint 103. These new coordinates may then be used to separate one or more of the resized images in the scaled composite video image (see 309 below) to use in compositing a new composite video image (see 311 below). For example, see U.S. Provisional Patent Application titled “Virtual Multiway Scaler Compensation”, Ser. No. 60/949,674, which was filed Jul. 13, 2007, whose inventors are Keith C. King and Wayne E. Mock, which was incorporated by reference above.
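The ratio-based compensation described above can be expressed in a few lines. The function below is a sketch of the idea, not the referenced application's method; in particular, how inclusive pixel boundaries are rounded after scaling is a policy choice left open here.

```python
def rescale_coords(coords, original_size, detected_size):
    """Map (left, top, right, bottom) boundaries from the original
    composite video image onto the scaled composite video image,
    using the ratio of the two sizes."""
    (ow, oh), (dw, dh) = original_size, detected_size
    left, top, right, bottom = coords
    return (left * dw // ow, top * dh // oh,
            right * dw // ow, bottom * dh // oh)

# A 1280x720 composite scaled down to 640x360 for a reduced-bandwidth
# link; the user 2 image of FIG. 9 lands at the boundaries below.
print(rescale_coords((640, 0, 1279, 359), (1280, 720), (640, 360)))
# -> (320, 0, 639, 179)
```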

At 309, the endpoint 103 may separate the video images 555 using the coordinate information 519. Virtual decoders 517 at one or more of the endpoints 103 may separate the composite video image 505 (e.g., a continuous presence layout) into two or more separate video images 559. In some embodiments, the coordinate information 519 may be used to find video image boundaries of the video images 555 within the video frame 507. In some embodiments, the coordinate information 519 may be used to determine where the respective video images 555 start and stop in the video frame 507. These start/stop locations may be used by the virtual decoder 517 to separate one or more video images 555 from the video frame 507. For example, the separate video images may be defined and/or scaled out of the composite video image 505. For example, the coordinate information 519 may be used by the virtual decoder 517 to crop the respective video images 555 (e.g., video images 555a and 555b) in the video frame 507. In some embodiments, separating the video images 555 may include, for example, storing the separated video images 559 in separate locations of a memory. In some embodiments, separating the video images 555 may include storing start and/or stop locations of the separated video images 559 in a memory. Other means for separating the video images 555 are also contemplated. For example, separating may include copying, replacing, and/or modifying data from the video images 555 of the composite video image 505 to be used to create a new composite image layout (see 311 below).
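In software terms, the cropping step amounts to slicing sub-images out of the decoded frame at the boundaries given by the coordinate information. The sketch below assumes the decoded frame is held as a NumPy array and that boundaries are inclusive pixel indices; it illustrates the separation idea, not the actual virtual decoder 517 implementation.

```python
import numpy as np

def separate_images(frame: np.ndarray, regions: list[dict]) -> dict:
    """Crop each region out of a composite video frame. Boundaries are
    inclusive pixel indices, so 1 is added for NumPy's exclusive end."""
    return {
        r["endpoint"]: frame[r["top"] : r["bottom"] + 1,
                             r["left"] : r["right"] + 1]
        for r in regions
    }

# Two 640x360 images side by side in a 1280x360 frame (cf. 505d above):
frame = np.zeros((360, 1280, 3), dtype=np.uint8)
regions = [
    {"endpoint": "103a", "left": 0,   "top": 0, "right": 639,  "bottom": 359},
    {"endpoint": "103b", "left": 640, "top": 0, "right": 1279, "bottom": 359},
]
parts = separate_images(frame, regions)
print(parts["103b"].shape)   # (360, 640, 3)
```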

FIG. 9 illustrates an example of a use of coordinate information 519 to locate the boundaries of four video images (e.g., video images 555a-d) in order to separate the video images 555. For example, the User 1 video image 555a may have a left boundary at 0, a top boundary at 0, a right boundary at 639 (e.g., 639 pixels to the right of the left edge of the video frame 507), and a bottom boundary at 359. Similarly, the User 2 video image 555b may have a left boundary at 640, a top boundary at 0, a right boundary at 1279, and a bottom boundary at 359. Coordinate information 519 (e.g., boundary information) for other video images (e.g., video images 555c and 555d) may also be provided, e.g., in metadata 901. In some embodiments, coordinate information for a respective video image may be placed in a row of information for the respective video image. For example, row one of data in metadata 901 may include a call identifier, system name, number, Internet Protocol (IP) address, and left, top, right, and bottom coordinates (e.g., 0, 0, 639, and 359) for a respective video image (other information may also be included).
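To make the row format concrete: assuming a simple comma-separated encoding (an assumption; the patent does not fix one), a row like the one described could be read as follows. Note that the boundaries are inclusive pixel indices, so the example coordinates 0, 0, 639, 359 describe a 640 by 360 pixel image.

```python
# Hypothetical comma-separated row of metadata 901: call identifier,
# system name, number, IP address, then left/top/right/bottom.
row = "call-17,Conference Room A,1,10.0.0.5,0,0,639,359"
call_id, name, number, ip, *coords = row.split(",")
left, top, right, bottom = map(int, coords)

# Inclusive boundaries: right = 639 with left = 0 spans 640 pixels.
width = right - left + 1      # 640
height = bottom - top + 1     # 360
```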

While four video images 555 are shown with respect to video frame 507, it is noted that video frame 507 may include a composite video image 505 with other combinations and layouts of two or more video images 555. For example, as seen in FIG. 5b, composite video image 505b may include four video images 555 stacked on top of each other. In some embodiments, each video image of the stacked video images may be 1280 by 720 pixels (e.g., for a total size of 1280 by 2880) (other dimensions and numbers of video images are also contemplated). In some embodiments, composite video image 505c may include four images side by side. As another example, the composite video image 505d may include two video images (e.g., each 640 by 360 pixels) arranged side by side in a 1280 by 360 pixel video frame. The video frame 507 may then be separated into two 640 by 360 pixel video images. Other combinations and layouts are also contemplated. In some embodiments, the number of video images 555 composited in the composite video image 505 may depend on the number of participating endpoints 103 in the video conference. For example, each participating endpoint may have a corresponding video image (which may be, for example, 1280 by 720) in the composite video image 505.

FIG. 6 shows an example of a composite video image 600 with three video images 601, 603, and 605 originating from different endpoints 103. The composite video image 600 may include a main video image 601 of the endpoint with the current speaker/presenter and two or more side video images (e.g., side video images 603 and 605) of other endpoints participating in the video conference. Coordinate information 519 for coordinates 609, 611, and 613 may be sent with the video frame 507 and used by the virtual decoder 517 to separate the video images into separated video images 701, 703, and 705 (as seen in FIG. 7).

In some embodiments, the virtual decoder 517 may be implemented as a software abstraction on hardware such as an FPGA or other processor. In some embodiments, one or more virtual decoders 517 may be implemented on a single ASIC (Application Specific Integrated Circuit). Other virtual decoder configurations are also contemplated. In some embodiments, a separate processor may implement the virtual decoder 517 by issuing commands to reprogram at least one FPGA to implement the virtual decoder 517. Other configurations are also contemplated.

At 311, the endpoint 103 may generate a new composite video image based, for example, on user preference. In some embodiments, one or more of the separated video images 559 may be provided to one or more scalers 513. The video images (including scaled video images, if any) may then be provided to one or more compositors 515. One or more compositors 515 may composite the video images into a new video image layout 559 (e.g., requested by a local participant 214 through their local endpoint 103d). In some embodiments, a local participant may cycle through the layout offerings from the endpoint 103 (e.g., by clicking an icon to cycle to the next available layout). In some embodiments, the scalers 513 and compositors 515 may be implemented in hardware or software. In some embodiments, icon scalers may be used (e.g., if all of the endpoint's other scalers are being used).
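A compositor's paste step can be pictured as writing each (optionally rescaled) separated image onto a blank canvas at its target position. The following sketch is illustrative only; scaling (scalers 513) is omitted, and the positions and canvas size are assumptions.

```python
import numpy as np

def composite(images: list[np.ndarray], positions: list[tuple[int, int]],
              canvas_size: tuple[int, int] = (720, 1280)) -> np.ndarray:
    """Paste each image onto a blank canvas at its (top, left) corner
    to form a new video image layout."""
    canvas = np.zeros((*canvas_size, 3), dtype=np.uint8)
    for img, (top, left) in zip(images, positions):
        h, w = img.shape[:2]
        canvas[top : top + h, left : left + w] = img
    return canvas

# Three equal-sized images in a row, echoing the layout of FIG. 8:
imgs = [np.full((240, 426, 3), shade, dtype=np.uint8)
        for shade in (64, 128, 192)]
new_layout = composite(imgs, [(240, 0), (240, 427), (240, 854)])
```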

As an example, if the main video image 701 and each of the two side video images 703 and 705 are to be placed in a video image layout with equal sized video images, the main video image 701 may be scaled down and the two side video images 703 and 705 may be scaled up (or not scaled at all). Other scaling combinations are also contemplated. In some embodiments, the separated video images may not be scaled (e.g., the separated video images may be only rearranged).

In some embodiments, the endpoint 103 may form a new composite video image that includes its current local video image 555e (see FIG. 5a) as one of the video images. In some embodiments, the layout of the received video image layout and the new video image layout may be the same. In some embodiments, the current local video image 555e may be more current than the local video image 555c originally sent to the MCU 108 and received in the composite video image 505.

At 313, the endpoint 103 may display the new composite video image. FIG. 8 illustrates an example of a new video image layout with three similar sized video images 801, 803, and 805 on display. FIG. 10 illustrates other possible video image layouts, according to various embodiments. Other video image layouts are also contemplated. In some embodiments, the metadata 901 may be displayed (e.g., with each respective video image in the video image layout).

Embodiments of a subset or all (and portions or all) of the above may be implemented by program instructions stored in a memory medium or carrier medium and executed by a processor. A memory medium may include any of various types of memory devices or storage devices. The term “memory medium” is intended to include an installation medium, e.g., a Compact Disc Read Only Memory (CD-ROM), floppy disks, or tape device; a computer system memory or random access memory such as Dynamic Random Access Memory (DRAM), Double Data Rate Random Access Memory (DDR RAM), Static Random Access Memory (SRAM), Extended Data Out Random Access Memory (EDO RAM), Rambus Random Access Memory (RDRAM), etc.; or a non-volatile memory such as a magnetic medium, e.g., a hard drive, or optical storage. The memory medium may include other types of memory as well, or combinations thereof. In addition, the memory medium may be located in a first computer in which the programs are executed, or may be located in a second different computer that connects to the first computer over a network, such as the Internet. In the latter instance, the second computer may provide program instructions to the first computer for execution. The term “memory medium” may include two or more memory mediums that may reside in different locations, e.g., in different computers that are connected over a network.

In some embodiments, a computer system at a respective participant location may include a memory medium(s) on which one or more computer programs or software components according to one embodiment of the present invention may be stored. For example, the memory medium may store one or more programs that are executable to perform the methods described herein. The memory medium may also store operating system software, as well as other software for operation of the computer system.

Further modifications and alternative embodiments of various aspects of the invention may be apparent to those skilled in the art in view of this description. Accordingly, this description is to be construed as illustrative only and is for the purpose of teaching those skilled in the art the general manner of carrying out the invention. It is to be understood that the forms of the invention shown and described herein are to be taken as embodiments. Elements and materials may be substituted for those illustrated and described herein, parts and processes may be reversed, and certain features of the invention may be utilized independently, all as would be apparent to one skilled in the art after having the benefit of this description of the invention. Changes may be made in the elements described herein without departing from the spirit and scope of the invention as described in the following claims.

CLAIMS

1. A computer implemented method, comprising: receiving a video frame comprising two or more video images, wherein at least two of the two or more video images are from different video conferencing endpoints; and separating the two or more video images into separate video images.

2. The computer implemented method of claim 1, wherein the video frame is received from a multipoint control unit (MCU).

3. The computer implemented method of claim 1, wherein separating the video images is performed through the use of coordinate information.

4. The computer implemented method of claim 3, wherein the coordinate information is received in metadata with the video frame.

5. The computer implemented method of claim 3, wherein separating the video images is performed by separating the video images at their respective boundaries, wherein the boundaries are identified through the coordinate information received with the video frame.

6. The computer implemented method of claim 3, wherein the coordinate information indicates a top boundary, a left boundary, a right boundary, and a bottom boundary of at least one video image of the two or more video images.

7. The computer implemented method of claim 5, wherein indicating a top boundary comprises indicating a pixel location of a starting pixel at a top of the at least one video image of the two or more video images.

8. The computer implemented method of claim 1, wherein separating the two or more video images comprises cropping the two or more video images down to one video image.

9. The computer implemented method of claim 1, wherein separating the two or more video images comprises storing the two or more video images in separate locations of a memory.

10. A method, comprising: receiving video images from a plurality of video conferencing endpoints communicatively coupled to an MCU; generating a composite video image comprised of at least two video images from respective video conferencing endpoints of the plurality of video conferencing endpoints; transmitting the composite video image to at least one video conferencing endpoint; and transmitting coordinate information to the at least one video conferencing endpoint, wherein the coordinate information includes information on a location of a video image of the at least two video images within the composite video image; wherein the at least one video conferencing endpoint is operable to separate at least one video image from the composite video image using the coordinate information.

11. The method of claim 10, wherein the coordinate information is comprised in metadata transmitted with a video frame comprising the composite video image.

12. The method of claim 10, further comprising receiving an image layout preference from the at least one video conferencing endpoint.

13. The method of claim 12, wherein the composite video image is generated according to the received image layout preference.

14. The method of claim 10, wherein the at least one video conferencing endpoint is operable to separate the video images by separating the video images at their respective boundaries, wherein the boundaries are identified through the coordinate information received with the composite video image.

15. The method of claim 10, wherein the coordinate information indicates a top boundary, a left boundary, a right boundary, and a bottom boundary of at least one video image of the two or more video images.

16. The method of claim 15, wherein indicating a top boundary comprises indicating a pixel location of a starting pixel at a top of the at least one video image of the two or more video images.

17. The method of claim 10, wherein the at least one video conferencing endpoint is operable to separate the video images by cropping the two or more video images down to one video image.

18. The method of claim 10, wherein the at least one video conferencing endpoint is operable to separate the video images by storing the two or more video images in separate locations of a memory.

19. A system, comprising: a processor; a memory coupled to the processor and configured to store program instructions executable by the processor to: receive a video frame comprising two or more video images, wherein at least two of the two or more video images are from different video conferencing endpoints; and separate the two or more video images into separate video images.

20. The system of claim 19, wherein separating the two or more video images is performed by separating the video images at their respective boundaries, wherein the boundaries are identified through coordinate information received with the video frame.